16 research outputs found

    Slowness learning for curiosity-driven agents

    In the absence of external guidance, how can a robot learn to map the many raw pixels of high-dimensional visual inputs to useful action sequences? I study methods that achieve this by making robots self-motivated (curious) to continually build compact representations of sensory inputs that encode different aspects of the changing environment. Previous curiosity-based agents acquired skills by associating intrinsic rewards with world model improvements, and used reinforcement learning (RL) to learn how to get these intrinsic rewards. Unlike previous implementations, I consider streams of high-dimensional visual inputs, where the world model is a set of compact low-dimensional representations of those inputs. To learn these representations, I use the slowness learning principle, which states that the underlying causes of the changing sensory inputs vary on a much slower time scale than the observed sensory inputs. The representations learned through the slowness learning principle are called slow features (SFs). Slow features have been shown to be useful for RL, since they capture the underlying transition process by extracting spatio-temporal regularities in the raw sensory inputs. However, existing techniques that learn slow features are not readily applicable to curiosity-driven online learning agents, as they estimate computationally expensive covariance matrices from the data via batch processing. The first contribution, incremental SFA (IncSFA), is a low-complexity, online algorithm that extracts slow features without storing any input data or estimating costly covariance matrices, making it suitable for online learning applications. However, IncSFA gradually forgets previously learned representations whenever the statistics of the input change. In open-ended online learning, it becomes essential to store learned representations to avoid re-learning previously learned inputs.
The second contribution is an online active modular IncSFA algorithm called the curiosity-driven modular incremental slow feature analysis (Curious Dr. MISFA). Curious Dr. MISFA addresses the forgetting problem faced by IncSFA and learns expert slow feature abstractions in order from least to most costly, with theoretical guarantees. The third contribution uses the Curious Dr. MISFA algorithm in a continual curiosity-driven skill acquisition framework that enables robots to acquire, store, and re-use both abstractions and skills in an online and continual manner. I (a) provide a formal analysis of the proposed algorithms; (b) compare them to existing methods; and (c) use the iCub humanoid robot to demonstrate their application in real-world environments. Together, these contributions demonstrate that online implementations of slowness learning enable an open-ended curiosity-driven RL agent to acquire a repertoire of skills that map the many raw pixels of high-dimensional images to multiple sets of action sequences.
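The slowness principle described in this abstract can be illustrated with a minimal batch, linear slow feature analysis sketch. This is the covariance-based batch formulation that the abstract says is unsuitable for online agents; it is not IncSFA or Curious Dr. MISFA. The function name `batch_sfa` and the eigenvalue threshold are assumptions made for illustration.

```python
import numpy as np

def batch_sfa(x, n_features=1):
    """Illustrative batch linear SFA (hypothetical helper, not IncSFA).

    x: array of shape (T, d), one observation per time step.
    Returns (slow_features, projection) where slow_features has shape
    (T, n_features), ordered from slowest to fastest.
    """
    # 1. Center the signal.
    x = x - x.mean(axis=0)

    # 2. Whiten: rotate and scale so the covariance becomes the identity.
    cov = np.atleast_2d(np.cov(x, rowvar=False))
    eigval, eigvec = np.linalg.eigh(cov)
    keep = eigval > 1e-10          # assumed threshold for degenerate directions
    whitener = eigvec[:, keep] / np.sqrt(eigval[keep])
    z = x @ whitener

    # 3. Approximate the time derivative with finite differences.
    dz = np.diff(z, axis=0)

    # 4. Directions with the smallest derivative variance are the slowest.
    dcov = dz.T @ dz / len(dz)
    _, dvec = np.linalg.eigh(dcov)  # eigenvalues come back in ascending order
    projection = whitener @ dvec[:, :n_features]
    return x @ projection, projection
```

Given a linear mixture of a slowly varying and a quickly varying source, the first returned feature should track the slow source up to sign and scale. Because this version estimates full covariance matrices from stored data, it illustrates the cost that an incremental method is meant to avoid.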

    Event Tables for Efficient Experience Replay

    Experience replay (ER) is a crucial component of many deep reinforcement learning (RL) systems. However, uniform sampling from an ER buffer can lead to slow convergence and unstable asymptotic behaviors. This paper introduces Stratified Sampling from Event Tables (SSET), which partitions an ER buffer into Event Tables, each capturing important subsequences of optimal behavior. We prove a theoretical advantage over the traditional monolithic buffer approach and combine SSET with an existing prioritized sampling strategy to further improve learning speed and stability. Empirical results in challenging MiniGrid domains, benchmark RL environments, and a high-fidelity car racing simulator demonstrate the advantages and versatility of SSET over existing ER buffer sampling approaches.
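The idea of partitioning a replay buffer into event tables and sampling from each stratum can be sketched roughly as follows. This is an assumed reading of the abstract, not the paper's implementation: the class name `EventTableBuffer`, the event predicates, the history length, and the fixed `event_fraction` are all hypothetical, and the sketch omits the prioritized sampling the abstract mentions.

```python
import random
from collections import deque

class EventTableBuffer:
    """Illustrative replay buffer with one default table plus event tables."""

    def __init__(self, event_fns, capacity=10000, history_len=20, seed=0):
        # event_fns maps a table name to a predicate over a single transition.
        self.event_fns = event_fns
        self.default = deque(maxlen=capacity)
        self.tables = {name: deque(maxlen=capacity) for name in event_fns}
        # Recent transitions of the current episode, copied when an event fires.
        self.history = deque(maxlen=history_len)
        self.rng = random.Random(seed)

    def add(self, transition, done=False):
        self.default.append(transition)
        self.history.append(transition)
        for name, is_event in self.event_fns.items():
            if is_event(transition):
                # Store the subsequence that led up to (and includes) the event.
                self.tables[name].extend(self.history)
        if done:
            self.history.clear()

    def sample(self, batch_size, event_fraction=0.5):
        # Split the event share evenly across non-empty event tables,
        # then fill the remainder uniformly from the default table.
        nonempty = [table for table in self.tables.values() if table]
        per_table = int(batch_size * event_fraction) // len(nonempty) if nonempty else 0
        batch = []
        for table in nonempty:
            batch += self.rng.choices(list(table), k=per_table)
        batch += self.rng.choices(list(self.default), k=batch_size - len(batch))
        return batch
```

A caller might register a predicate such as `{"goal": lambda tr: tr[2] > 0}` for `(state, action, reward, next_state)` tuples, so that trajectories ending in a positive reward are over-represented in each batch relative to uniform sampling. Sampling before any transition has been added raises an error in this sketch.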

    AutoIncSFA and vision-based developmental learning for humanoid robots

    Humanoids have to deal with novel, unsupervised high-dimensional visual input streams. Our new method AutoIncSFA learns to compactly represent such complex sensory input sequences by very few meaningful features corresponding to high-level spatio-temporal abstractions, such as "a person is approaching me" or "an object was toppled". We explain the advantages of AutoIncSFA over previous related methods, and show that the compact codes greatly facilitate the task of a reinforcement learner driving the humanoid to actively explore its world like a playing baby, maximizing intrinsic curiosity reward signals for reaching states corresponding to previously unpredicted AutoIncSFA features.

    Perceptual abstraction and attention

    This is a report on the preliminary achievements of WP4 of the IM-CleVeR project on abstraction for cumulative learning, directed in particular at: (1) producing algorithms to develop abstraction features under top-down action influence; (2) producing algorithms that support the detection of change in motion pictures; (3) developing attention and vergence control on the basis of locally computed rewards; (4) searching for abstract representations suitable for the LCAS framework; (5) developing predictors based on information theory to support novelty detection. The report is organized around these 5 tasks that are part of WP4. We provide a synthetic description of the work done for each task by the partners.

    Detection and Avoidance of Semi-Transparent Obstacles using a Collective-Reward Based Approach

    Most computer and robot-vision algorithms are designed mainly for opaque objects, and non-opaque objects have received less attention despite being omnipresent in man-made environments. With the increasing usage of such objects, especially those made of glass or plastic, it becomes necessary to detect this class of objects when building a robot navigation system. Obstacle avoidance is a primary yet challenging task in mobile robot navigation. The main objective of this paper is to present an algorithm to detect and avoid obstacles that are made of semi-transparent materials, such as plastic or glass. The algorithm uses a technique called the collective-reward based approach to detect such objects from single images captured by an uncalibrated camera in a live video stream. Random selection techniques are incorporated in the method to make the algorithm run in real time. A mobile robot then uses the detection results to perform an obstacle avoidance maneuver. Experiments were conducted on a real robot to test the efficacy of the algorithm.